02. Lesson Outline

Data wrangling process:

  • Gather (this lesson)
  • Assess
  • Clean

Gathering data is the first step in data wrangling. Before gathering, we have no data, and after it, we do.

Gathering data varies from project to project. Sometimes you're just given data, or pointed to it like I've done for you throughout this course. Sometimes you need to search for the right data for your project. Sometimes the data you need isn't readily available, and you need to generate it yourself. Even when you do find your data, it's not unusual for it to be spread across several different sources and file formats, which complicates organizing it in your programming environment.

For these reasons and more, gathering can be tricky. In this lesson, which is likely the most technically challenging lesson of the course, you'll acquire the coding skills and general craftiness required to conquer the vast majority of gathering scenarios you'll come across in the future. This is going to be hard sometimes, and that's okay. Stick with it and don't hesitate to reach out for help.

This lesson will be structured as follows:

  • First, we'll pose a few questions.
  • Then you'll explore the source of each piece of data we need to answer those questions, each piece from a different source and in a different format.
  • Then you'll learn about the structure of each file format.
  • Then you'll learn how to handle that file format using Python and its libraries.
  • Then you'll actually gather each piece of data, which you'll later join together to create your master dataset.
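To preview the final step, here is a minimal sketch of reading two pieces of data in different formats and joining them into one dataset. It uses only Python's standard library (`csv`, `json`, `io`). The sample records, the `title` join key, and the helper name `gather_and_join` are all hypothetical. They are not the data or code used later in the lesson.

```python
import csv
import io
import json

# Hypothetical sample data standing in for two separately gathered sources.
CSV_TEXT = """title,year
Movie A,1999
Movie B,2004
"""

JSON_TEXT = '[{"title": "Movie A", "score": 91}, {"title": "Movie B", "score": 78}]'


def gather_and_join(csv_text, json_text, key="title"):
    """Read one CSV source and one JSON source, then join them on a shared key.

    The function name and the default key are illustrative assumptions.
    """
    # Parse the CSV rows into dictionaries keyed by column name (values stay strings).
    csv_rows = list(csv.DictReader(io.StringIO(csv_text)))
    # Parse the JSON array and index its records by the join key.
    json_by_key = {record[key]: record for record in json.loads(json_text)}
    master = []
    for row in csv_rows:
        # Keep only rows present in both sources (an inner join).
        match = json_by_key.get(row[key])
        if match is not None:
            master.append({**row, **match})
    return master


master_dataset = gather_and_join(CSV_TEXT, JSON_TEXT)
print(master_dataset)
```

In practice, libraries such as pandas handle the reading and joining more conveniently. The standard-library version just makes each step explicit.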